2025-12-02

Motivation

Non-linear regression

  • Given \(n\) data points \(\{x_i, y_i\}_{i=1}^n\)
  • Predict \((x_{\texttt{new}}, y_{\texttt{new}})\)

Can all be done with Gaussian distributions!

Review of the Multivariate Normal (MVN) in 2-d

If we have a vector of random variables \(\mathbf{x}\) and

\[ \mathbf{x} \sim \mathcal{N}_d(\boldsymbol{\mu}, \mathbf{\Sigma}) \]

then the joint probability density of \(\mathbf{x}\) is given by the multivariate normal:

\[ p\left( \mathbf{x} \,|\, \boldsymbol{\mu}, \mathbf{\Sigma} \right) \propto \exp \left\{ -\frac{1}{2} (\mathbf{x}-\boldsymbol{\mu})^\top \Sigma^{-1} (\mathbf{x}-\boldsymbol{\mu}) \right\} \]

Review of the Multivariate Normal (MVN) in 2-d

A two-dimensional MVN with \(\boldsymbol{\mu}=[0,0]\) and \(\Sigma=\begin{bmatrix} 1 & 0.7\\ 0.7 & 1 \end{bmatrix}\):

# requires the mvtnorm package for dmvnorm()
library(mvtnorm)

# mean vector
mu <- c(0, 0)

# covariance matrix
Sigma <- matrix(c(1, 0.7,
                  0.7, 1), nrow = 2)

# grid of x1 and x2 values
x1x2_grid <- expand.grid(x1 = seq(-3, 3, length.out = 100),
                         x2 = seq(-3, 3, length.out = 100))

# density at each grid point, used for the probability contours
probabilities <- dmvnorm(as.matrix(x1x2_grid), mean = mu, sigma = Sigma)

Review of the Multivariate Normal (MVN) in 2-d

Probability contour plot:


Conditioning of the MVN in 2-d

We can condition on one of the variables, \(p(x_2 \,|\, x_1,\,\Sigma)\)
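For the bivariate case this conditional has a closed form: \(x_2 \mid x_1\) is again Gaussian, with mean \(\mu_2 + \Sigma_{21}\Sigma_{11}^{-1}(x_1-\mu_1)\) and variance \(\Sigma_{22} - \Sigma_{21}^2/\Sigma_{11}\). A minimal sketch, reusing the \(\boldsymbol{\mu}\) and \(\mathbf{\Sigma}\) from the contour example and an illustrative observed value `x1_obs`:

```r
# conditional distribution of x2 given x1 for a bivariate normal:
#   mean = mu2 + s21/s11 * (x1 - mu1)
#   var  = s22 - s21^2/s11
mu    <- c(0, 0)
Sigma <- matrix(c(1, 0.7,
                  0.7, 1), nrow = 2)

x1_obs <- 1  # illustrative observed value, not from the slides

cond_mean <- mu[2] + Sigma[2, 1] / Sigma[1, 1] * (x1_obs - mu[1])
cond_var  <- Sigma[2, 2] - Sigma[2, 1]^2 / Sigma[1, 1]

cond_mean  # 0.7
cond_var   # 0.51
```

Note that conditioning both shifts the mean toward the observation and shrinks the variance (from 1 to 0.51), which is exactly the mechanism GP prediction exploits in higher dimensions.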

Gaussian Processes

MVN in multiple dimensions

\[ \mathbf{\Sigma} = \begin{bmatrix} 1 & 0.7\\ 0.7 & 1 \end{bmatrix} \]

\(\Sigma\) matrix in 10 dimensions

\[ \mathbf{\Sigma} = \begin{bmatrix} 1.00 & 0.90 & 0.67 & 0.41 & 0.20 & 0.08 & 0.03 & 0.01 & 0.00 & 0.00 \\ 0.90 & 1.00 & 0.90 & 0.67 & 0.41 & 0.20 & 0.08 & 0.03 & 0.01 & 0.00 \\ 0.67 & 0.90 & 1.00 & 0.90 & 0.67 & 0.41 & 0.20 & 0.08 & 0.03 & 0.01 \\ 0.41 & 0.67 & 0.90 & 1.00 & 0.90 & 0.67 & 0.41 & 0.20 & 0.08 & 0.03 \\ 0.20 & 0.41 & 0.67 & 0.90 & 1.00 & 0.90 & 0.67 & 0.41 & 0.20 & 0.08 \\ 0.08 & 0.20 & 0.41 & 0.67 & 0.90 & 1.00 & 0.90 & 0.67 & 0.41 & 0.20 \\ 0.03 & 0.08 & 0.20 & 0.41 & 0.67 & 0.90 & 1.00 & 0.90 & 0.67 & 0.41 \\ 0.01 & 0.03 & 0.08 & 0.20 & 0.41 & 0.67 & 0.90 & 1.00 & 0.90 & 0.67 \\ 0.00 & 0.01 & 0.03 & 0.08 & 0.20 & 0.41 & 0.67 & 0.90 & 1.00 & 0.90 \\ 0.00 & 0.00 & 0.01 & 0.03 & 0.08 & 0.20 & 0.41 & 0.67 & 0.90 & 1.00 \end{bmatrix} \]
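The entries of this matrix are consistent with a squared-exponential kernel evaluated on the inputs \(x = 1, \dots, 10\). The sketch below assumes \(\sigma^2 = 1\) and \(\ell^2 = 5\); these hyperparameter values are an inference from the numbers shown, not stated on the slide, but they reproduce the matrix to two decimals:

```r
# squared-exponential kernel: k(x, x*) = sigma2 * exp(-(x - x*)^2 / (2 * l2))
# assumed hyperparameters: sigma2 = 1, l2 = 5 (chosen to match the matrix above)
sq_exp_kernel <- function(x, sigma2 = 1, l2 = 5) {
  d <- outer(x, x, "-")            # pairwise differences x_i - x_j
  sigma2 * exp(-d^2 / (2 * l2))
}

Sigma10 <- sq_exp_kernel(1:10)
round(Sigma10[1, ], 2)  # 1.00 0.90 0.67 0.41 0.20 0.08 0.03 0.01 0.00 0.00
```

The kernel makes nearby dimensions highly correlated and distant ones nearly independent, which is what produces smooth sample paths.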


Fixing points with the conditional distribution

In a simple example, imagine we are given two data points \(\{(x_1, y_1), (x_2, y_2)\}\) and need to predict the function at all other inputs.

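The two-point scenario above can be sketched directly with the standard MVN conditioning identities: build the joint covariance between observed and new inputs from the kernel, then condition. Data and hyperparameters here are illustrative, not from the slides:

```r
# condition a zero-mean GP with a squared-exponential kernel on two points
sq_exp <- function(a, b, sigma2 = 1, l2 = 1) {
  sigma2 * exp(-outer(a, b, "-")^2 / (2 * l2))
}

x_obs <- c(-1, 1); y_obs <- c(0.5, -0.3)   # the two given data points (illustrative)
x_new <- seq(-3, 3, length.out = 61)        # inputs where we predict the function

K   <- sq_exp(x_obs, x_obs)   # covariance among observed points  (Sigma_22)
Ks  <- sq_exp(x_new, x_obs)   # cross-covariance new vs observed  (Sigma_12)
Kss <- sq_exp(x_new, x_new)   # covariance among new points       (Sigma_11)

post_mean <- Ks %*% solve(K, y_obs)        # conditional mean
post_cov  <- Kss - Ks %*% solve(K, t(Ks))  # conditional covariance
```

Because no observation noise is included, the posterior mean interpolates the two data points exactly and the posterior variance collapses to zero there, then grows as we move away from the data.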

Formal definition of Gaussian Processes (GPs)

A Gaussian process is a generalization of a multivariate Gaussian distribution to infinitely many variables.

A Gaussian process is a collection of random variables, any finite number of which have a joint Gaussian distribution.

As a distribution over functions, a Gaussian process is completely specified by two functions:

  • A mean function, \(m(\mathbf{x}) = \mathbb{E}[f(\mathbf{x})]\) and
  • A covariance function, \(k(\mathbf{x}, \mathbf{x^*}) = \mathbb{E}\Big[\big(f(\mathbf{x})- m(\mathbf{x})\big) \big(f(\mathbf{x^*})- m(\mathbf{x^*})\big) \Big]\)

\[ f(\mathbf{x}) \sim \mathcal{GP}\big(\, m(\mathbf{x}), k(\mathbf{x}, \mathbf{x^*}) \, \big) \]

Mathematical formalism: Regression

Generative model

\[ y(\mathbf{x}) = f(\mathbf{x}) \Big[ + \sigma_y \epsilon \Big]\\ \epsilon \sim \mathcal{N}(0,1) \]

Place GP prior over the nonlinear function (mean function often taken as 0).

\[ p(f(\mathbf{x}) \, | \, \theta) = \mathcal{GP}\big(0, k(\mathbf{x}, \mathbf{x^*})\big)\\ k(\mathbf{x}, \mathbf{x^*}) = \sigma^2 \exp \left\{ -\frac{1}{2\ell^2}(x-x^*)^2 \right\} \]
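Draws from this prior can be visualized by evaluating the kernel on a grid and sampling from the resulting finite-dimensional MVN (using `rmvnorm` from the mvtnorm package; \(\sigma^2 = 1\) and \(\ell = 1\) are illustrative choices):

```r
library(mvtnorm)

# evaluate the squared-exponential kernel on a grid of inputs
x      <- seq(-5, 5, length.out = 200)
sigma2 <- 1; l <- 1   # illustrative hyperparameters
K      <- sigma2 * exp(-outer(x, x, "-")^2 / (2 * l^2))

# jitter the diagonal for numerical stability, then draw 5 prior functions
K <- K + diag(1e-8, length(x))
set.seed(1)
f_draws <- rmvnorm(5, mean = rep(0, length(x)), sigma = K)

matplot(x, t(f_draws), type = "l", lty = 1,
        ylab = "f(x)", main = "Samples from the GP prior")
```

Each row of `f_draws` is one random function; the lengthscale \(\ell\) controls how quickly those functions wiggle.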

Mathematical formalism: Predictions

\[ p(\mathbf{y}_1, \mathbf{y}_2) = \mathcal{N} \left( \begin{bmatrix} \boldsymbol{\mu}_1\\ \boldsymbol{\mu}_2 \end{bmatrix}, \begin{bmatrix} \mathbf{\Sigma}_{11} & \mathbf{\Sigma}_{12}\\ \mathbf{\Sigma}_{21} & \mathbf{\Sigma}_{22} \end{bmatrix}\right) \\ p(\mathbf{y}_1 \,|\, \mathbf{y}_2) = \frac{p(\mathbf{y}_1, \mathbf{y}_2)}{p(\mathbf{y}_2)} \]

After some involved derivation:

  • Predictive mean: \(\boldsymbol{\mu}_{\mathbf{y}_1|\mathbf{y}_2} = \boldsymbol{\mu}_1 + \mathbf{\Sigma}_{12}\mathbf{\Sigma}_{22}^{-1}(\mathbf{y}_2-\boldsymbol{\mu}_2)\)
  • Predictive variance: \(\mathbf{\Sigma}_{\mathbf{y}_1|\mathbf{y}_2} = \mathbf{\Sigma}_{11} - \mathbf{\Sigma}_{12}\mathbf{\Sigma}_{22}^{-1}\mathbf{\Sigma}_{21}\)
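These two identities are the whole of GP regression. A sketch that also folds the observation noise \(\sigma_y\) from the generative model into \(\mathbf{\Sigma}_{22}\) (data and hyperparameters are illustrative, not from the slides):

```r
# GP regression predictions via the MVN conditioning identities,
# with observation noise added on the diagonal of Sigma_22
sq_exp <- function(a, b, sigma2 = 1, l2 = 1) {
  sigma2 * exp(-outer(a, b, "-")^2 / (2 * l2))
}

x_obs   <- c(-2, -1, 0, 1.5)              # illustrative training inputs
y_obs   <- c(-0.8, 0.3, 1.1, -0.2)        # illustrative training targets
sigma_y <- 0.1                            # observation noise sd
x_new   <- seq(-3, 3, length.out = 121)

S22 <- sq_exp(x_obs, x_obs) + sigma_y^2 * diag(length(x_obs))
S12 <- sq_exp(x_new, x_obs)
S11 <- sq_exp(x_new, x_new)

pred_mean <- S12 %*% solve(S22, y_obs)          # mu_{y1|y2}
pred_cov  <- S11 - S12 %*% solve(S22, t(S12))   # Sigma_{y1|y2}
pred_sd   <- sqrt(pmax(diag(pred_cov), 0))      # pointwise predictive sd
```

Using `solve(S22, ...)` rather than forming `solve(S22)` explicitly is the standard numerically safer choice; in practice a Cholesky factorization of `S22` is used instead.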

Optimization using Gaussian Processes

  • Analytic expression for the expected loss of evaluating \(y(x)\), under a limited, myopic approximation
  • Consider multiple function evaluations into the future
  • The Bayesian formalism allows estimation of confidence
  • The Gaussian process allows incorporation of prior information
  • Learning from observations of derivatives
  • Resolution of conditioning issues
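One standard concrete instance of such an analytic expected-loss expression is expected improvement (EI) for minimization, which has a closed form under the GP posterior. This is a generic sketch of that acquisition function, not code from the lecture:

```r
# expected improvement (EI) for minimization under a Gaussian posterior
# mu, sd: GP posterior mean and sd at a candidate point; y_best: best value so far
expected_improvement <- function(mu, sd, y_best) {
  z <- (y_best - mu) / sd
  (y_best - mu) * pnorm(z) + sd * dnorm(z)
}

# EI grows as the posterior mean drops below y_best or as uncertainty grows
expected_improvement(mu = 0.0, sd = 0.5, y_best = 0.2)
```

Maximizing EI over candidate inputs picks the next evaluation point, trading off exploitation (low posterior mean) against exploration (high posterior uncertainty).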